Creators/Authors contains: "Li, Xiaoman"


  1. Enhancer-promoter interactions (EPIs) are fundamental to gene regulation, and understanding how they recur across diverse biological samples is key to deciphering chromatin architecture. In this study, we systematically analyzed the recurrence of EPIs across 49 Hi-C and 95 HiChIP datasets. We found that the majority of EPIs identified in a given sample were also present in other samples, regardless of the assay type (Hi-C or HiChIP) or the enhancer annotations used. Interestingly, EPIs that appeared unique to individual samples were typically surrounded by fewer neighboring EPIs, suggesting that they may not represent truly sample-specific interactions. Our findings indicate that most human EPIs have already been captured and that cells primarily reuse subsets of these shared EPIs across different cell types and conditions. This study provides new insights into the pervasive and reusable nature of EPIs in the human genome, with important implications for chromatin conformation studies.
    Free, publicly-accessible full text available September 29, 2026
  2. Studying miRNA activity at the single-cell level is challenging because existing single-cell technologies are limited in their ability to capture miRNAs. To address this, we introduce two deep learning models, cross-modality (CM) and single-modality (SM), both based on encoder-decoder architectures. These models predict miRNA expression at both the bulk and single-cell levels from mRNA data. We evaluated CM and SM against the state-of-the-art miRSCAPE approach on both bulk and single-cell datasets. Our results demonstrate that both CM and SM outperform miRSCAPE in accuracy. Furthermore, incorporating miRNA target information substantially improved performance compared with models that used all genes. These models provide powerful tools for predicting miRNA expression from single-cell mRNA data.
    Free, publicly-accessible full text available June 1, 2026
  3. Motivation: Extracellular miRNAs (exmiRs) and intracellular mRNAs can both serve as promising biomarkers and therapeutic targets for various diseases. However, exmiR expression data are often noisy, and obtaining intracellular mRNA expression data usually requires invasive procedures. To gain insight into disease mechanisms, it is therefore essential to improve the quality of exmiR expression data and to develop noninvasive methods for assessing intracellular mRNA expression. Results: We developed CrossPred, a deep-learning multi-encoder model for the cross-prediction of exmiRs and mRNAs. Using contrastive learning, we created a shared embedding space that integrates exmiRs and mRNAs. This shared embedding was then used to predict intracellular mRNA expression from noisy exmiR data and to predict exmiR expression from intracellular mRNA data. We evaluated CrossPred on three types of cancer and assessed its effectiveness in predicting the expression levels of exmiRs and mRNAs. CrossPred outperformed the baseline encoder-decoder model, exmiR- or mRNA-based models, and variational autoencoder models. Moreover, integrating exmiR and mRNA data uncovered important exmiRs and mRNAs associated with cancer. Our study offers new insights into the bidirectional relationship between mRNAs and exmiRs. Availability and implementation: The datasets and tool are available at https://doi.org/10.5281/zenodo.13891508.
  4. Small proteins (SPs) are pivotal in various cellular functions such as immunity, defense, and communication. Despite their significance, their identification is still in its infancy. Existing computational tools are tailored to specific eukaryotic species, leaving only a few options for SP identification in prokaryotes, and these tools still perform suboptimally. To fill this gap, we introduce PSPI, a deep learning-based approach designed specifically for predicting prokaryotic SPs. We showed that PSPI predicted both generalized sets of prokaryotic SPs and sets specific to the human metagenome with high accuracy. Compared with three existing tools, PSPI was faster and showed greater precision, sensitivity, and specificity, not only for prokaryotic SPs but also for eukaryotic ones. We also observed that incorporating (n,k)-mers greatly enhanced the performance of PSPI, suggesting that many SPs may contain short linear motifs. The PSPI tool, freely available at https://www.cs.ucf.edu/~xiaoman/tools/PSPI/, will be useful for identifying prokaryotic SPs, and it can also be trained to identify other types of SPs.
  5. Small proteins (SPs) are typically defined as eukaryotic proteins shorter than 100 amino acids and prokaryotic proteins shorter than 50 amino acids. Historically, they were disregarded because of the arbitrary size thresholds used to define proteins. However, recent research has revealed the existence of many SPs and their crucial roles. Despite this, the identification of SPs and the elucidation of their functions are still in their infancy. To pave the way for future SP studies, we briefly introduce the limitations of and advances in experimental techniques for SP identification. We then provide an overview of available computational tools for SP identification, their constraints, and their evaluation. Additionally, we highlight existing resources for SP research. This survey aims to spur further exploration of SPs and encourage the development of more sophisticated computational tools for SP identification in prokaryotes and microbiomes.
  6. Helmer-Citterich, Manuela (Ed.)
    MicroRNAs (miRNAs) play crucial roles in gene regulation. Most studies focus on mature miRNAs, leaving much unknown about primary miRNAs (pri-miRNAs). To fill this gap, in this study we attempted to model the expression of pri-miRNAs in 1829 primary cell types, cell lines, and tissues. We demonstrated that the expression of pri-miRNAs can be modeled well by the expression of specific sets of mRNAs, which we term their associated mRNAs. These associated mRNAs differ from the miRNAs' target mRNAs and are enriched in specific functions. Most associated mRNAs of a miRNA are shared across conditions, while on average about one-fifth are condition-specific. Our study sheds new light on miRNA biogenesis and on gene transcriptional regulation in general.
    El Allali, Achraf (Ed.)
    With mutations constantly accumulating in bacterial genomes, it is unclear whether previously identified bacterial strains are actually present in an extant sample. To address this question, we conducted a case study on the known strains of the bacterial species S. aureus and S. epidermidis in 68 atopic dermatitis shotgun metagenomic samples. We evaluated the likelihood that each of the sixteen known strains, predicted in the original study and by two popular tools in this study, was present. We found that even with the same tool, only two known strains were predicted by both the original study and this study. Moreover, none of the sixteen known strains was likely present in these 68 samples. Our study thus indicates the limitations of known-strain-based studies, especially those on rapidly evolving bacterial species. It implies that previously identified known strains are unlikely to be present in a current environmental sample. It also calls for de novo identification of bacterial strains directly from shotgun metagenomic reads.
  8. Integrating single-cell multi-omics data is a challenging task that has led to new insights into complex cellular systems. Various computational methods, including deep learning, have been proposed to effectively integrate these rapidly accumulating datasets. However, despite the proven success of deep learning in integrating multi-omics data and its superior performance over classical computational methods, there has been no systematic study of its application to single-cell multi-omics data integration. To fill this gap, we conducted a literature review to explore the use of multimodal deep learning techniques in single-cell multi-omics data integration, taking into account recent studies from multiple perspectives. Specifically, we first summarized the different modalities found in single-cell multi-omics data. We then reviewed current deep learning techniques for processing multimodal data and categorized deep learning-based integration methods for single-cell multi-omics data by data modality, deep learning architecture, fusion strategy, key tasks, and downstream analysis. Finally, we provided insights into using these deep learning models to integrate multi-omics data and better understand single-cell biological mechanisms.
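The recurrence measure described in item 1 (the share of a sample's EPIs that also occur in other samples) can be illustrated with a minimal Python sketch. It assumes each sample's EPIs have already been reduced to exact (enhancer, promoter) identifier pairs. The function name `recurrence_fractions` and the toy identifiers are hypothetical. A real analysis would more likely match interactions by genomic overlap than by identical IDs.

```python
from collections import Counter


def recurrence_fractions(samples: dict[str, set[tuple[str, str]]]) -> dict[str, float]:
    """For each sample, return the fraction of its EPIs found in at least one other sample.

    Assumption: an EPI is a hashable (enhancer_id, promoter_id) pair, and two
    samples share an EPI only if the pair is identical.
    """
    # Each sample is a set, so this counts how many samples contain each EPI.
    counts = Counter(epi for epis in samples.values() for epi in epis)

    fractions: dict[str, float] = {}
    for name, epis in samples.items():
        if not epis:
            fractions[name] = 0.0  # an empty sample has nothing to share
            continue
        # counts[epi] > 1 means the pair also occurs in some other sample.
        shared = sum(1 for epi in epis if counts[epi] > 1)
        fractions[name] = shared / len(epis)
    return fractions


if __name__ == "__main__":
    toy = {
        "A": {("e1", "p1"), ("e2", "p2")},
        "B": {("e1", "p1"), ("e3", "p3")},
        "C": {("e1", "p1"), ("e2", "p2"), ("e4", "p4")},
    }
    print(recurrence_fractions(toy))
```

In the toy data, every EPI in sample A recurs elsewhere (fraction 1.0), half of B's do (0.5), and two of C's three do (2/3).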
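Item 3 describes using contrastive learning to build a shared exmiR-mRNA embedding. The abstract does not name the loss, so the sketch below assumes one common choice: a symmetric InfoNCE objective. In this setup, the exmiR and mRNA embeddings of the same sample form the positive pair, and every other pairing in the batch serves as a negative. The function names, the temperature of 0.1, and the toy vectors are illustrative assumptions and not CrossPred's published implementation. The encoders that would produce the embeddings are omitted.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(z_mir: list[list[float]], z_mrna: list[list[float]], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss (assumed objective, not confirmed for CrossPred).

    Row i of z_mir and row i of z_mrna are embeddings of the same sample
    (the positive pair); all other row pairings act as negatives.
    """
    n = len(z_mir)
    # sim[i][j]: scaled similarity between exmiR embedding i and mRNA embedding j.
    sim = [[cosine(z_mir[i], z_mrna[j]) / temperature for j in range(n)] for i in range(n)]

    def mean_cross_entropy(matrix: list[list[float]]) -> float:
        # Cross-entropy with the diagonal entry as the correct "class" for each row.
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    sim_t = [list(col) for col in zip(*sim)]  # mRNA -> exmiR direction
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(sim_t))
```

Minimizing this loss pulls matched exmiR/mRNA embeddings together and pushes mismatched ones apart, which is what allows a shared space to support prediction in both directions. In practice it would be computed on encoder outputs with a tensor library that provides automatic differentiation.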